Complete genome sequence of Sulfurimonas autotrophica type strain (OK10T)

Sulfurimonas autotrophica Inagaki et al. 2003 is the type species of the genus Sulfurimonas. This genus is of interest because it contributes significantly to the global sulfur cycle by oxidizing sulfur compounds to sulfate, and because it apparently inhabits deep-sea hydrothermal and marine sulfidic environments as a potential ecological niche. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second complete genome sequence of the genus Sulfurimonas and the 15th genome in the family Helicobacteraceae. The 2,153,198 bp long genome with its 2,165 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain OK10T (= DSM 16294 = ATCC BAA-671 = JCM 11897) is the type strain of Sulfurimonas autotrophica [1], which is the type species of its genus Sulfurimonas [1,2]. The genus currently comprises three validly named species [4,5]: S. autotrophica, S. paralvinellae and S. denitrificans, the latter of which was formerly classified as Thiomicrospira denitrificans [3]. Autotrophic and mixotrophic sulfur-oxidizing bacteria, such as the members of the genus Sulfurimonas, are believed to contribute significantly to the global sulfur cycle [6]. The genus name derives from the Latin word 'sulphur' and the Greek word 'monas', meaning a unit, in order to indicate a "sulfur-oxidizing rod" [1]. The species epithet derives from the Greek word 'auto', meaning self, and from the Greek adjective 'trophicos', meaning nursing, tending or feeding, in order to indicate its autotrophy [1]. S. autotrophica strain OK10T, like S. paralvinellae strain GO25T (= DSM 17229), was isolated from the surface of a deep-sea hydrothermal sediment on the Hatoma Knoll in the Mid-Okinawa Trough hydrothermal field [1,2]. Thus, the members of the genus Sulfurimonas appear to be free living, whereas the other members of the family Helicobacteraceae, the genera Helicobacter and Wolinella, appear to be strictly associated with the human stomach and the bovine rumen, respectively. Here we present a summary classification and a set of features for S. autotrophica OK10T, together with the description of the complete genomic sequencing and annotation.

Classification and features
No experimental reports currently indicate that further strains of this species have been cultivated. The type strains of S. denitrificans and S. paralvinellae share 93.5% and 96.3% 16S rRNA gene sequence similarity with strain OK10T, respectively. Further analysis also revealed that strain OK10T shares high similarity (99.1%) with the uncultured clone sequence PVB-12 (U15104) obtained from a microbial mat near the deep-sea hydrothermal vent in the Loihi Seamount, Hawaii [7]. This further corroborates the distribution of S. autotrophica in hydrothermal vents. The 16S rRNA gene sequence similarities of strain OK10T to metagenomic libraries (env_nt) were 87% or less, indicating the absence of further members of the species in the environments screened so far (status August 2010). Figure 1 shows the phylogenetic neighborhood of S. autotrophica OK10T in a 16S rRNA based tree. The sequences of the four 16S rRNA gene copies in the genome differ from each other by up to four nucleotides, and differ by up to three nucleotides from the previously published sequence (AB088431).

Figure 1. Phylogenetic tree highlighting the position of S. autotrophica OK10T relative to the type strains of the other species within the genus and the type strains of the other genera within the order Campylobacterales. The tree was inferred from 1,327 aligned characters [8,9] of the 16S rRNA gene sequence under the maximum likelihood criterion [10] and rooted in accordance with current taxonomy [11]. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 350 bootstrap replicates [12] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [13] are shown in blue, published genomes in bold [14,15], such as the recently published GEBA genomes from Sulfurospirillum deleyianum [16] and Arcobacter nitrofigilis [17].
The cells of strain OK10T are Gram-negative, occasionally slightly curved rods of 1.5-2.5 × 0.5-1.0 µm (Figure 2 and Table 1) [1]. On solid medium, the cells form white colonies [1]. Under optimal conditions, the generation time of S. autotrophica strain OK10T is approximately 1.4 h [1,2]. The reductive tricarboxylic acid (rTCA) cycle for autotrophic CO2 fixation is present in strain OK10T, as shown by PCR amplification of the respective genes [28]. Moreover, the activities of several key rTCA enzymes (ACL, ATP-dependent citrate lyase; POR, pyruvate:acceptor oxidoreductase; OGOR, 2-oxoglutarate:acceptor oxidoreductase; ICDH, isocitrate dehydrogenase) have been determined and compared with those of S. paralvinellae and S. denitrificans [28]. No enzyme activities for the phosphoenolpyruvate and ribulose 1,5-bisphosphate (Calvin-Benson) pathways were detected in strain OK10T [28], though the latter is apparently active in S. thermophila [28]. Soluble hydrogenase activity was also not found in strain OK10T [28]. With respect to sulfur oxidation, enzyme activity was detected for SOR (sulfite oxidoreductase) but not for APSR (adenosine 5′-phosphosulfate reductase) or TSO (thiosulfate-oxidizing enzymes) [28]. A detailed comparison of these enzyme activities to S. paralvinellae and S. denitrificans is given in Takai et al. [28]. Elemental sulfur, thiosulfate or sulfide is utilized as the sole electron donor for chemolithoautotrophic growth with O2 as the electron acceptor; thiosulfate is thereby oxidized to sulfate [1]. Organic substrates and H2 are not utilized as electron donors, and only oxygen is utilized as an electron acceptor [28]. Strain OK10T requires 4% sea salt for growth [1] and is not able to reduce nitrate [2].
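To put the reported generation time of approximately 1.4 h into perspective, the following minimal sketch converts it into a specific growth rate and a fold increase over time. It assumes idealized, unrestricted exponential growth, which is a textbook simplification and not something measured in this study; the function names are hypothetical.

```python
import math

# Generation (doubling) time under optimal conditions, as reported in the text.
GENERATION_TIME_H = 1.4


def specific_growth_rate(generation_time_h: float) -> float:
    """Return the specific growth rate mu (per hour) for a given doubling time.

    Assumes ideal exponential growth, N(t) = N0 * exp(mu * t).
    """
    return math.log(2) / generation_time_h


def fold_increase(hours: float, generation_time_h: float) -> float:
    """Return N(t)/N0 after `hours` of ideal exponential growth."""
    return 2 ** (hours / generation_time_h)


if __name__ == "__main__":
    mu = specific_growth_rate(GENERATION_TIME_H)
    print(f"specific growth rate: {mu:.3f} per hour")
    print(f"fold increase after 7 h: {fold_increase(7.0, GENERATION_TIME_H):.1f}")
```

Real cultures leave the exponential phase once substrate or oxygen becomes limiting, so these figures are upper bounds for illustration only.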

Table 1 (fragment). MIGS-4.4, Altitude: not reported (evidence code NAS).
Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [27]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [31], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [32]. The genome project is deposited in the Genome OnLine Database [13] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

[...] at 24°C. DNA was isolated from 0.5-1 g of cell paste using the MasterPure Gram Positive DNA Purification Kit (Epicentre MGP04100) following the standard protocol as recommended by the manufacturer, with modification st/LALM for cell lysis as described in Wu et al. [32].

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger, 454 and Illumina sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website. Illumina sequencing data were assembled with VELVET [34], and the consensus sequences were shredded into 1.5 kb overlapping fake reads and used for the assembly with 454 and Sanger data. Contigs resulting from a 454 Newbler (2.0.00.20-PostRelease-11-05-2008-gcc-3.4.6) assembly were shredded into 2 kb fake reads, which were assembled with Sanger data. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walk or PCR amplification (Roche Applied Science, Indianapolis, IN) [35]. A total of 790 additional custom primer reactions were necessary to close gaps and to raise the quality of the finished sequence. Illumina reads were also used to improve the final consensus quality using an in-house developed tool, the Polisher [36]. Together, the combination of the Illumina and 454 sequencing platforms provided 155.4× coverage of the genome. The error rate of the completed genome sequence is less than 1 in 100,000.
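The shredding step described above can be illustrated with a minimal sketch. This is not the JGI pipeline: the function name and the 500 bp overlap are assumptions made for illustration, since the text states only the fake-read lengths (1.5 kb and 2 kb) and that the reads overlap.

```python
def shred(consensus: str, read_len: int = 1500, overlap: int = 500) -> list[str]:
    """Cut a consensus sequence into overlapping fake reads.

    read_len follows the 1.5 kb figure from the text; the overlap value is
    an illustrative assumption, not a documented pipeline parameter.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")

    step = read_len - overlap  # distance between consecutive read starts
    reads = []
    for start in range(0, len(consensus), step):
        reads.append(consensus[start:start + read_len])
        # Stop once a read has reached the end of the consensus, so that no
        # trailing read is emitted that lies entirely inside the previous one.
        if start + read_len >= len(consensus):
            break
    return reads


if __name__ == "__main__":
    contig = "ACGT" * 1000  # toy 4 kb consensus sequence
    fake_reads = shred(contig)
    print(len(fake_reads), [len(r) for r in fake_reads])
```

Because consecutive fake reads share their ends, an overlap-based assembler such as phrap can rejoin them and merge them with reads from the other platforms.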

Genome annotation
Genes were identified using Prodigal [37] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [38]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [39].

Genome properties
The genome consists of a 2,153,198 bp long chromosome with a 35.2% GC content (Table 3 and Figure 3). Of the 2,220 genes predicted, 2,165 were protein-coding genes and 55 were RNA genes; seven pseudogenes were also identified. The majority of the protein-coding genes (69.1%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
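The figures in this paragraph can be cross-checked with a short calculation. The sketch below uses only the numbers quoted above; the function name is hypothetical, and the derived counts are approximate because the 69.1% figure is rounded.

```python
# Values quoted in the text.
GENOME_BP = 2_153_198
TOTAL_GENES = 2_220
PROTEIN_CODING = 2_165
RNA_GENES = 55
FRACTION_WITH_FUNCTION = 0.691  # rounded percentage from the text


def summarize_genome() -> dict:
    """Derive approximate summary statistics from the reported gene counts."""
    # The two gene classes should add up to the reported total.
    if PROTEIN_CODING + RNA_GENES != TOTAL_GENES:
        raise ValueError("gene counts are inconsistent with the reported total")

    with_function = round(PROTEIN_CODING * FRACTION_WITH_FUNCTION)
    return {
        "with_function_approx": with_function,
        "hypothetical_approx": PROTEIN_CODING - with_function,
        "genes_per_mb": TOTAL_GENES / (GENOME_BP / 1_000_000),
        "bp_per_gene": GENOME_BP / TOTAL_GENES,
    }


if __name__ == "__main__":
    for key, value in summarize_genome().items():
        print(f"{key}: {value:.1f}")
```

The exact counts of genes with and without function prediction should be taken from Table 3 rather than from this back-calculation.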